45 research outputs found

    Montgomery Multiplication on the Cell

    A technique to speed up Montgomery multiplication targeted at the Synergistic Processor Elements (SPEs) of the Cell Broadband Engine is proposed. The technique splits a number into four consecutive parts, which are placed one by one in the four element positions of a vector, representing columns in a 4-SIMD organization. This representation enables the arithmetic to be performed in a 4-SIMD fashion. An implementation of Montgomery multiplication using this technique is up to 2.47 times faster than the unrolled implementation of Montgomery multiplication in the IBM multi-precision math library, for odd moduli of 160 to 2048 bits. The presented technique can also be applied to speed up Montgomery multiplication on other SIMD architectures.
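    The split-into-columns idea can be illustrated with a short sketch (plain Python standing in for the SPE's 128-bit vectors; `split4`, `join4`, and `simd_add` are illustrative names, not from the paper):

    ```python
    def split4(x, part_bits):
        # Split x into four consecutive parts (least significant first),
        # one per element position of a 4-SIMD vector.
        mask = (1 << part_bits) - 1
        return [(x >> (i * part_bits)) & mask for i in range(4)]

    def join4(parts, part_bits):
        # Inverse of split4: reassemble the four columns into one integer.
        return sum(p << (i * part_bits) for i, p in enumerate(parts))

    def simd_add(a, b, part_bits):
        # The element-wise add is the 4-SIMD step; carries between the
        # columns are resolved in a short sequential pass afterwards.
        sums = [x + y for x, y in zip(a, b)]
        out, carry, mask = [], 0, (1 << part_bits) - 1
        for s in sums:
            s += carry
            out.append(s & mask)
            carry = s >> part_bits
        return out  # final carry out of the top column is dropped here

    x = split4(0x0123456789ABCDEF, 16)
    y = split4(0x1111111111111111, 16)
    total = join4(simd_add(x, y, 16), 16)
    ```

    The Montgomery reduction itself then operates column-wise on this representation; the sketch only shows the data layout that makes the 4-SIMD arithmetic possible.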

    Performance Analysis of the SHA-3 Candidates on Exotic Multi-core Architectures

    The NIST hash function competition to design a new cryptographic hash standard, SHA-3, is currently one of the hot topics in cryptologic research, and its outcome heavily depends on the public evaluation of the remaining 14 candidates. There have been several cryptanalytic efforts to evaluate the security of these hash functions; concurrently, invaluable benchmarking efforts have been made to measure the performance of the candidates on multiple architectures. In this paper we contribute to the latter: we evaluate the performance of all second-round SHA-3 candidates on two exotic platforms, the Cell Broadband Engine (Cell) and NVIDIA Graphics Processing Units (GPUs). First, we give performance estimates for each candidate based on the number of arithmetic instructions, which can be used as a starting point for evaluating the performance of the SHA-3 candidates on various platforms. Second, we use these generic estimates together with Cell- and GPU-specific optimization techniques to give more precise figures for our target platforms. Finally, we present implementation results for all ten non-AES-based SHA-3 candidates.
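    An instruction-count-based estimate of the kind described can be sketched as follows (the function, its parameters, and the example numbers are hypothetical, not the paper's figures):

    ```python
    def estimate_cycles_per_byte(arith_ops_per_block, block_bytes, ipc):
        # First-order model: if a compression function needs N arithmetic
        # instructions per message block and the core retires `ipc`
        # instructions per cycle, hashing costs roughly
        # N / (ipc * block_bytes) cycles per byte.
        return arith_ops_per_block / (ipc * block_bytes)

    # Hypothetical candidate: 2048 instructions per 64-byte block,
    # on a core sustaining 2 instructions per cycle.
    cpb = estimate_cycles_per_byte(2048, 64, 2.0)
    ```

    Such estimates ignore memory traffic and scheduling stalls, which is why the paper refines them with Cell- and GPU-specific optimization before reporting measured results.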

    Imatinib in patients with severe COVID-19: a randomised, double-blind, placebo-controlled, clinical trial

    Background The major complication of COVID-19 is hypoxaemic respiratory failure from capillary leak and alveolar oedema. Experimental and early clinical data suggest that the tyrosine-kinase inhibitor imatinib reverses pulmonary capillary leak.
    Methods This randomised, double-blind, placebo-controlled clinical trial was done at 13 academic and non-academic teaching hospitals in the Netherlands. Hospitalised patients (aged >=18 years) with COVID-19, confirmed by an RT-PCR test for SARS-CoV-2, who required supplemental oxygen to maintain a peripheral oxygen saturation of greater than 94% were eligible. Patients were excluded if they had severe pre-existing pulmonary disease, had pre-existing heart failure, had undergone active treatment of a haematological or non-haematological malignancy in the previous 12 months, had cytopenia, or were receiving concomitant treatment with medication known to interact strongly with imatinib. Patients were randomly assigned (1:1) to receive either oral imatinib, given as a loading dose of 800 mg on day 0 followed by 400 mg daily on days 1-9, or placebo. Randomisation was done with a computer-based clinical data management platform with variable block sizes (containing two, four, or six patients), stratified by study site. The primary outcome was time to discontinuation of mechanical ventilation and supplemental oxygen for more than 48 consecutive hours, while being alive during a 28-day period. Secondary outcomes included safety, mortality at 28 days, and the need for invasive mechanical ventilation. All efficacy and safety analyses were done in all randomised patients who had received at least one dose of study medication (modified intention-to-treat population). This study is registered with the EU Clinical Trials Register (EudraCT 2020-001236-10).
    Findings Between March 31, 2020, and Jan 4, 2021, 805 patients were screened, of whom 400 were eligible and randomly assigned to the imatinib group (n=204) or the placebo group (n=196). A total of 385 (96%) patients (median age 64 years [IQR 56-73]) received at least one dose of study medication and were included in the modified intention-to-treat population. Time to discontinuation of ventilation and supplemental oxygen for more than 48 h was not significantly different between the two groups (unadjusted hazard ratio [HR] 0.95 [95% CI 0.76-1.20]). At day 28, 15 (8%) of 197 patients in the imatinib group had died, compared with 27 (14%) of 188 patients in the placebo group (unadjusted HR 0.51 [0.27-0.95]). After adjusting for baseline imbalances between the two groups (sex, obesity, diabetes, and cardiovascular disease), the HR for mortality was 0.52 (95% CI 0.26-1.05). The HR for mechanical ventilation in the imatinib group compared with the placebo group was 1.07 (0.63-1.80; p=0.81). The median duration of invasive mechanical ventilation was 7 days (IQR 3-13) in the imatinib group compared with 12 days (6-20) in the placebo group (p=0.0080). 91 (46%) of 197 patients in the imatinib group and 82 (44%) of 188 patients in the placebo group had at least one grade 3 or higher adverse event. The safety evaluation revealed no imatinib-associated adverse events.
    Interpretation The study failed to meet its primary outcome, as imatinib did not reduce the time to discontinuation of ventilation and supplemental oxygen for more than 48 consecutive hours in patients with COVID-19 requiring supplemental oxygen. The observed effects on survival (although attenuated after adjustment for baseline imbalances) and on duration of mechanical ventilation suggest that imatinib might confer clinical benefit in hospitalised patients with COVID-19, but further studies are required to validate these findings.
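    The reported 28-day mortality counts can be checked with simple arithmetic; note this yields the crude risk ratio, not the Cox-model hazard ratio the trial reports:

    ```python
    # 28-day mortality counts from the modified intention-to-treat population.
    deaths_imatinib, n_imatinib = 15, 197
    deaths_placebo, n_placebo = 27, 188

    risk_imatinib = deaths_imatinib / n_imatinib      # about 7.6%
    risk_placebo = deaths_placebo / n_placebo         # about 14.4%
    crude_risk_ratio = risk_imatinib / risk_placebo   # about 0.53
    ```

    The crude ratio lands close to the unadjusted HR of 0.51, but the HR additionally accounts for time-to-event, which is why the two differ in general.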
    Copyright (C) 2021 Elsevier Ltd. All rights reserved.

    NASB: Neural Architecture Search for Binary Convolutional Neural Networks

    No full text
    Binary Convolutional Neural Networks (CNNs) significantly reduce the number of arithmetic operations and the memory storage needed for CNNs, which makes their deployment on mobile and embedded systems more feasible. However, after binarization, the CNN architecture has to be redesigned and refined significantly, for two reasons: (1) the large accumulation error of binarization in the forward propagation, and (2) the severe gradient mismatch problem of binarization in the backward propagation. Even though substantial effort has been invested in designing architectures for single and multiple binary CNNs, it is still difficult to find an optimized architecture for binary CNNs. In this paper, we propose a strategy, named NASB, which adapts Neural Architecture Search (NAS) to find an optimized architecture for the binarization of CNNs. In the NASB strategy, the operations and their connections define a unique search space, and the training and binarization of the network progress in a three-stage training algorithm. Due to the flexibility of this automated strategy, the obtained architecture is not only suitable for binarization but also has low overhead, achieving a better trade-off between accuracy and computational complexity than hand-optimized binary CNNs. The NASB strategy is evaluated on the ImageNet dataset and demonstrated to be a better solution than existing quantized CNNs. With an insignificant increase in overhead, NASB outperforms existing single and multiple binary CNNs by up to 4.0% and 1.0% Top-1 accuracy respectively, bringing them closer to the precision of their full-precision counterparts.
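    The two problems named above are conventionally handled with sign binarization in the forward pass and a straight-through estimator in the backward pass; a minimal sketch of that standard technique (not NASB's search itself) looks like:

    ```python
    def binarize(weights):
        # Forward pass: replace each real-valued weight by its sign,
        # so every weight is in {-1, +1} (source of accumulation error).
        return [1.0 if w >= 0.0 else -1.0 for w in weights]

    def ste_backward(weights, upstream_grads):
        # Backward pass: sign() has zero gradient almost everywhere, so
        # the straight-through estimator passes gradients through
        # unchanged where |w| <= 1 and clips them to zero elsewhere.
        # This approximation is the "gradient mismatch" the text mentions.
        return [g if abs(w) <= 1.0 else 0.0
                for w, g in zip(weights, upstream_grads)]

    b = binarize([0.3, -1.7, 0.0])
    g = ste_backward([0.3, -1.7, 0.0], [0.5, 0.5, 0.5])
    ```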

    Timed circuit verification using TEL structures

    No full text

    ReAF: Reducing approximation of channels by reducing feature reuse within convolution

    No full text
    High-level feature maps of Convolutional Neural Networks are computed by reusing their corresponding low-level feature maps, which fully exploits feature reuse to improve computational efficiency. This form is referred to as feature reuse between convolutional layers. A second type is feature reuse within the convolution, where the channels of the output feature maps are computed by reusing the same channels of the input feature maps, which results in an approximation of the output channels. To compute them accurately, we need specialized input feature maps for every channel of the output feature maps. In this paper, we first discuss the approximation problem introduced by full feature reuse within the convolution and then propose a new feature reuse scheme called Reducing Approximation of channels by Reducing Feature reuse (REAF). The paper also shows that group convolution is a special case of our REAF scheme, and we analyze the advantage of REAF over such group convolution. Moreover, we develop the REAF+ scheme and integrate it with group-convolution-based models. Compared with baselines, experiments on image classification demonstrate the effectiveness of our REAF and REAF+ schemes. Under a given computational complexity budget, the Top-1 accuracy of REAF-ResNet50 and REAF+-MobileNetV2 on ImageNet increases by 0.37% and 0.69% respectively. The code and pre-trained models will be publicly available.
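    Since group convolution is cited as a special case, its cost arithmetic is easy to sketch (a generic textbook formula, not REAF's scheme itself):

    ```python
    def conv_macs(c_in, c_out, k, h_out, w_out, groups=1):
        # Multiply-accumulates for a k x k convolution. With G groups,
        # each output channel reuses only c_in // G input channels
        # instead of all c_in, so total cost drops by a factor of G
        # relative to full feature reuse within the convolution.
        assert c_in % groups == 0 and c_out % groups == 0
        return (c_in // groups) * c_out * k * k * h_out * w_out

    full = conv_macs(64, 64, 3, 56, 56)              # groups=1: full reuse
    grouped = conv_macs(64, 64, 3, 56, 56, groups=4)  # 4x cheaper
    ```

    The trade-off REAF navigates is that reducing reuse this way also restricts which input channels each output channel can draw on.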

    A matrix-multiply unit for posits in reconfigurable logic leveraging (Open)CAPI

    No full text
    In this paper, we present the design in reconfigurable logic of a matrix multiplier for matrices of 32-bit posit numbers with es=2 [1]. Vector dot products are computed without intermediate rounding, as suggested by the proposed posit standard, to maximally retain precision. An initial implementation targets the CAPI 1.0 interface on the POWER8 processor and achieves about 10 Gpops (giga posit operations per second). Follow-on implementations targeting CAPI 2.0 and OpenCAPI 3.0 on POWER9 are expected to achieve up to 64 Gpops. Our design is available under a permissive open-source license at https://github.com/ChenJianyunp/Unum_matrix_multiplier. We hope the current work, which targets CAPI 1.0, along with future community contributions, will help enable a more extensive exploration of this proposed new format.
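    The no-intermediate-rounding rule (the role of the posit standard's quire accumulator) can be mimicked in software with exact rational accumulation; this is a sketch of the numerical idea, not the FPGA design:

    ```python
    from fractions import Fraction

    def exact_dot(xs, ys):
        # Accumulate every product exactly, as a quire-style accumulator
        # does; the float() conversion is the single, final rounding.
        acc = Fraction(0)
        for x, y in zip(xs, ys):
            acc += Fraction(x) * Fraction(y)
        return float(acc)

    def naive_dot(xs, ys):
        # Conventional float dot product: rounds after every operation,
        # so large cancelling terms can wipe out small contributions.
        acc = 0.0
        for x, y in zip(xs, ys):
            acc += x * y
        return acc

    xs, ys = [1e16, 1.0, -1e16], [1.0, 1.0, 1.0]
    ```

    Here the naive version loses the middle term entirely (1e16 + 1.0 rounds back to 1e16), while the exact accumulator preserves it.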

    Benchmarking Apache Arrow Flight - A wire-speed protocol for data transfer, querying and microservices

    No full text
    Moving structured data between different big data frameworks and/or data warehouses and storage systems often causes significant overhead. Most of the time, more than 80% of the total time spent accessing data is lost in the serialization/deserialization step. Columnar data formats are gaining popularity in both analytics and transactional databases. Apache Arrow, a unified columnar in-memory data format, promises to provide efficient data storage, access, manipulation, and transport. In addition, with the introduction of the Arrow Flight communication capabilities, which are built on top of gRPC, Arrow enables high-performance data transfer over TCP networks. Arrow Flight allows parallel Arrow RecordBatch transfer over networks in a platform- and language-independent way, and offers high performance, parallelism, and security based on open-source standards. In this paper, we bring together some recently implemented use cases of Arrow Flight with their benchmarking results. These use cases include bulk Arrow data transfer, querying subsystems, and Flight as a microservice integrated into different frameworks, to show the throughput and scalability of this protocol. We show that Flight is able to achieve up to 6000 MB/s and 4800 MB/s throughput for DoGet() and DoPut() operations respectively. On nodes with Mellanox ConnectX-3 or Connect-IB interconnects, Flight can utilize up to 95% of the total available bandwidth. Flight is scalable and can efficiently use up to half of the available system cores for bidirectional communication. For query systems like Dremio, Flight is an order of magnitude faster than the ODBC and turbodbc protocols: an Arrow Flight-based implementation on Dremio performs 20x and 30x better than turbodbc and ODBC connections respectively. We briefly outline some recent Flight-based use cases in big data frameworks such as Apache Spark and Dask and in remote Arrow data processing tools. We also discuss some limitations and the future outlook of Apache Arrow and Arrow Flight as a whole.
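    A minimal in-process DoGet round trip with `pyarrow.flight` gives a feel for the benchmarked path (`EchoServer` and the table contents are illustrative; real deployments resolve the ticket to a dataset):

    ```python
    import pyarrow as pa
    import pyarrow.flight as flight

    class EchoServer(flight.FlightServerBase):
        # Toy Flight server that holds a single table and streams it
        # back to any client, regardless of the ticket contents.
        def __init__(self, location, table):
            super().__init__(location)
            self._table = table

        def do_get(self, context, ticket):
            # Serve the table as a stream of Arrow RecordBatches.
            return flight.RecordBatchStream(self._table)

    table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})

    # Port 0 binds an ephemeral port; the server serves in the background.
    with EchoServer("grpc://127.0.0.1:0", table) as server:
        client = flight.FlightClient(f"grpc://127.0.0.1:{server.port}")
        result = client.do_get(flight.Ticket(b"anything")).read_all()
    ```

    Because the payload is already in Arrow's columnar layout on both ends, the transfer avoids the serialization/deserialization step that dominates conventional data movement.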